Molecular Ecology Resources — Latest Matching Preprints

1

From fluke to fragment: a multifaceted method for molecular sex identification and mitochondrial haplotyping from environmental DNA samples

Rodriguez, L. K.; Schallhart, S.; Hobmeier, P.; Curran, T.; Perez-Jorge, S.; Prieto, R.; Oliveira, C.; Silva, M. A.; Thalinger, B.

2026-05-04 genomics 10.64898/2026.04.30.719183 medRxiv

Top 0.1%

65.0%

Environmental DNA (eDNA) analyses have become a powerful tool for non-invasive biodiversity monitoring, yet the applicability of population genetic approaches to environmental samples remains largely unexplored. Even when genetic traces originate from a single individual, low target DNA concentrations and amplification or sequencing artefacts can compromise downstream genetic inferences. Here, we present a novel approach for obtaining demographic insights and lineage-level mitogenomic information from aquatic eDNA samples collected near vertebrate individuals. Paired eDNA and tissue samples were collected during sperm whale (Physeter macrocephalus) encounters in the Azores. Samples were screened for the presence of vertebrate eDNA and analyzed with a novel molecular sex identification assay. Additionally, long-range PCR was used to amplify up to five mitochondrial DNA fragments (~3-4 kb) before subsequent sequencing on an Oxford Nanopore Technologies platform. A stringent three-tier filtering framework capable of identifying true mitogenomic variation across eDNA samples was developed for maximum recovery of genetic diversity at the haplogroup level. By benchmarking eDNA samples via their paired tissues, parameter values were optimized to maximize concordance and minimize spurious variant calls. Sexing was successful for 50% of eDNA samples, with 96% concordance to paired tissues, and marine vertebrate DNA concentration significantly predicted sexing success. Further, Medaka polishing produced high identity mitochondrial consensus sequences (>16 kb) from eDNA samples. Across filtering regimes in the framework, curated SNP panels comprising up to 453 high-confidence mitochondrial SNPs resolved 19 haplogroups, with 93% concordance between eDNA and tissue samples.
An intermediate bioinformatics filtering strategy maximized biologically accurate haplogroup recovery while minimizing sequencing artefacts, providing the most reliable lineage-level inferences. This integrative approach demonstrates that targeted nuclear assays combined with long-range mitochondrial sequencing can recover individual-level genetic information from aquatic eDNA. By defining analytical thresholds governing success, the framework advances non-invasive genetic monitoring of populations via eDNA and enables population-level monitoring and conservation of endangered and genetically vulnerable species.

2

Portable, multilocus DNA barcoding across the diversity of meiofauna

Keene, D.; Arya, S.; Walker, B.; Laumer, C. E.

2026-05-22 zoology 10.64898/2026.05.20.726206 medRxiv

Top 0.1%

61.5%

Molecular data have revolutionised taxonomic and ecological research on the hyperdiverse communities of aquatic benthic microinvertebrates known as meiofauna. However, reference sequence databases remain highly incomplete, with variable barcode genes or fragments studied from taxon to taxon. Furthermore, there is a typical tradeoff between universality of primers and phylogenetic resolution, with rRNA markers being robustly recoverable but failing to resolve species-level divergences, and mitochondrial markers showing the reverse trend. Here, we introduce Oxford Nanopore rRNA and COI amplicon sequencing (OrCa-seq), a rapid, low-cost protocol for parallel long-range PCR amplification and multiplexed sequencing of four amplicons, spanning the nearly complete rRNA cistron (~7-8 kb) and the widely studied Folmer region of COI (represented as overlapping 313 and 658 bp amplicons). This protocol, with its associated bioinformatic workflow, was designed for conducting biodiversity inventories of meiofauna and can be easily carried out in field research and educational contexts, with data available from 96-well plates of specimens within a day of lysis. To validate the method, we processed six plates of student-isolated freshwater and limno-terrestrial meiofauna, characterising the recovery of target genes and taxa with both automated and human-curated BLAST database comparisons. These data demonstrate the universal applicability of OrCa-seq across effectively all meiofauna, including the very smallest species. Nonetheless, recovery efficiency for each amplicon shows variation by taxon, with the full-length Folmer COI amplicon standing out as the most challenging. We present exemplar phylogenetic trees integrating reference sequences, demonstrating the utility of these data in confirming morphological determinations and in identifying anonymous specimens in a reverse taxonomy context.
While developed in a specific educational context for use on meiofauna, the OrCa-seq approach should be readily scalable to larger research datasets and adaptable to many specimen types and to any combination of taxon- or target-specific primers. As such, it represents a compelling multi-locus extension to the ever-growing repertoire of nanopore DNA barcoding protocols.

3

TRIDENT (Taxonomic Resolution and IDentification using Environmental dNa Traces): An Optimized Algorithm for Vertebrate Taxonomic Assignments in eDNA Metabarcoding, Integrating Molecular, Taxonomic, and Ecological Criteria

Haderle, R.; Jung, G.; Riou, M.; Ung, V.; Jung, J.-L.

2026-07-09 molecular biology 10.64898/2026.06.29.735257 medRxiv

Top 0.1%

59.6%

Environmental DNA (eDNA) metabarcoding has become a powerful approach for large-scale biodiversity assessment, yet taxonomic assignment remains one of its most critical error-prone steps. Current bioinformatic pipelines rely on molecular similarity searches against reference databases, but assignment accuracy is constrained not only by short marker length and database incompleteness, but also by fundamental limitations, including recent species radiations, incomplete lineage sorting, introgression, NUMTs, and the imperfect correspondence between genetic variation and species boundaries. Here, we present TRIDENT (Taxonomic Resolution and IDentification using Environmental dNa Traces), an automated and simple protocol designed to improve taxonomic assignments in eDNA metabarcoding. Initially developed for marine vertebrates, TRIDENT may be used with any barcode and integrates three complementary sources of evidence: molecular similarity (NCBI/GenBank and BOLD), curated taxonomic information (WoRMS), and ecological plausibility derived from biogeographic occurrence data (GBIF). The workflow sequentially constructs candidate taxon lists based on sequence similarity, expands them through taxonomic hierarchies, and filters them using spatial occurrence constraints. It further identifies possible taxa lacking reference barcodes and evaluates their plausibility through CO1-based similarity if data exist in BOLD. TRIDENT has been implemented as a source-available Python tool and tested using empirical eDNA datasets from marine vertebrates as well as simulated communities. Results demonstrate that the tool produces taxonomic assignments consistent with expert manual curation while substantially reducing processing time and attention errors caused by manual processing of large datasets. 
By combining molecular, taxonomic, and ecological criteria within a single framework, TRIDENT improves transparency and reproducibility and provides a robust and flexible solution strengthening confidence in taxonomic identifications in eDNA-based biodiversity assessments.
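
The three sequential steps named in the abstract (similarity-based candidates, expansion through the taxonomic hierarchy, filtering by spatial occurrence) can be illustrated with a toy sketch. Everything below is hypothetical: the function name, the in-memory dictionaries, the identity values, and the 97% threshold are illustrative assumptions, not TRIDENT's interface, parameters, or results. The actual tool draws on GenBank, BOLD, WoRMS, and GBIF rather than hard-coded tables.

```python
def assign_candidates(hits, genus_members, regional_occurrence, min_identity=97.0):
    """Toy three-step candidate filtering (hypothetical; not TRIDENT's API)."""
    # Step 1: keep species whose best hit meets the (assumed) identity threshold.
    similar = {sp for sp, identity in hits.items() if identity >= min_identity}
    # Step 2: expand each retained species to its congeners, which may
    # include taxa that lack a reference barcode.
    expanded = set(similar)
    for sp in similar:
        genus = sp.split()[0]
        expanded.update(genus_members.get(genus, []))
    # Step 3: retain only taxa recorded in the study region.
    return sorted(sp for sp in expanded if sp in regional_occurrence)


# Made-up toy inputs, for illustration only.
hits = {
    "Delphinus delphis": 99.1,
    "Stenella coeruleoalba": 98.7,
    "Tursiops truncatus": 93.0,
}
genus_members = {
    "Delphinus": ["Delphinus delphis"],
    "Stenella": ["Stenella coeruleoalba", "Stenella frontalis", "Stenella longirostris"],
}
regional_occurrence = {
    "Delphinus delphis",
    "Stenella coeruleoalba",
    "Stenella frontalis",
    "Tursiops truncatus",
}

result = assign_candidates(hits, genus_members, regional_occurrence)
print(result)
```

In this toy case a congener without a sequence hit survives only if it is recorded in the region, which is the kind of ecological plausibility check the abstract describes.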

4

Let the prey speak: Using PNA clamps to silence predator DNA in marine faecal diet studies

Polanowski, A. M.; Suter, L.; Deagle, B. E.; McInnes, J. C.

2026-07-08 molecular biology 10.64898/2026.06.22.733645 medRxiv

Top 0.1%

54.1%

DNA metabarcoding of faeces is a powerful, non-invasive method for assessing predator diets. However, when studying the diet of generalist predators, broad PCR primers are used to amplify the wide range of potential prey species, and metabarcoding outputs are often dominated by sequences from the predator. While blocking primers can be used to reduce PCR amplification of predator DNA, they frequently cause partial predator suppression and unintended prey blocking. Peptide nucleic acid (PNA) clamps offer a promising, underutilised alternative by binding strongly and selectively to predator DNA to block its PCR amplification. In this study we designed and validated a novel PNA clamp targeting the 18S rRNA gene to suppress bird and mammal predator DNA in dietary samples. We tested this clamp on tissue mixtures and faecal samples from three seabird and two seal species across temperate, subantarctic, and Antarctic regions. The PNA clamp substantially increased the proportion of prey reads recovered while maintaining consistent prey community composition across all predator species. Our results not only demonstrate the general effectiveness of PNA clamps over standard blocking primers, but also provide a powerful, broadly applicable new tool to improve the accuracy of DNA diet metabarcoding studies.

5

DipSkmer: Reference-free population genomics with diploid genome skims

Charvel, E.; Alves Monteiro, H. J.; Mirarab, S.; Bafna, V.

2026-06-08 bioinformatics 10.64898/2026.06.05.730460 medRxiv

Top 0.1%

51.9%

Ecologists and conservation biologists rely on genetic diversity as a key essential biodiversity variable (EBV) used to track population health and dynamics, and utilize the population parameter θ (estimated by the average pairwise genomic distance) as a key metric of diversity. While whole-genome-sequencing (wgs) is increasingly affordable, it will be considerable time before the full diversity of life is represented by high-quality assembled genomes; even then, constant monitoring will still require repeated sampling of populations. In contrast, genome skimming (low-coverage, short-read wgs) is highly cost-effective but challenging to analyze because the coverage is too low for assembly and reliable error correction. Mature methods, such as Mash, exist for estimating pairwise genomic distances based on the Jaccard similarity of k-mer sets computed using sketching techniques. Some, such as Skmer, additionally model the impacts of low coverage. These methods have been successfully applied to assembly-free species identification and phylogenetics; however, their use in population genetics has been limited. This is because these methods implicitly treat genomes as haploid and heterozygosity confounds true estimates of genomic distance for diploid organisms. In this paper, we address this problem through a number of technical advances. First, we use coalescent theory to mathematically derive how the Jaccard index between two diploid samples changes with the scaled population size parameter (θ). Next, we derive an estimator that computes θ from the Jaccard index, in addition to several auxiliary variables, which we also estimate from the genome skims. The resulting method, DipSkmer, enables more accurate estimates of coverage, sequencing error, and pairwise nucleotide distance for diploid samples.
Analyses of both simulated and empirical datasets show that for diploids and low distances (e.g., < 2%), DipSkmer produces the most accurate pairwise distance estimates, outperforming existing alignment-free methods such as Mash and Skmer, and closely approximates ANGSD, a reference and alignment-based tool. Availability: The code for DipSkmer is available at https://github.com/echarvel3/ReSkmer/tree/DipSkmer-REFACTOR. Simulation scripts and environments are available at https://github.com/echarvel3/dipskmer_scripts. Author Summary: The process of obtaining full-genome population genomic measurements for biodiversity monitoring remains expensive due to the need for high-coverage sequencing and reference assemblies. Genome skimming has been shown to be a viable, low-coverage alternative for obtaining genomic distances, and alignment- and assembly-free methods exist for analyzing nuclear data from skimming data to estimate the distance between samples. However, existing methods fail to model within-sample heterozygosity, expected for diploid organisms. Given the dominance of diploidy among species of interest to ecologists, the implications of these simplifying assumptions warrant further study. Here, we present a mathematical model of the k-mer sets sampled from two diploid genomes from a Wright-Fisher population. We use the model to develop DipSkmer, a k-mer-based, reference-free method for estimating nucleotide diversity and population divergence that, unlike its predecessors, models within-sample heterozygosity. Benchmarking shows more accurate genomic diversity estimates compared to existing reference-free, genome-skimming methods and comparable performance to the popular high-coverage, reference-based method, ANGSD. Thus, DipSkmer enables accessible, less expensive population monitoring through genetic diversity estimates.
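
For orientation, the haploid baseline that the abstract contrasts with can be sketched in a few lines: the Jaccard index of two k-mer sets, converted to a distance with the standard Mash formula D = -(1/k) ln(2J / (1 + J)). This is not DipSkmer's estimator, which additionally models heterozygosity, coverage, and sequencing error. The sketch also uses exact k-mer sets rather than MinHash sketches, and the function names are illustrative assumptions.

```python
import math


def kmer_set(seq, k):
    """Return the set of all k-mers in a sequence (exact, no sketching)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two sets; two empty sets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mash_distance(j, k):
    """Mash-style distance from a Jaccard index j and k-mer length k."""
    if j <= 0:
        return math.inf  # the formula diverges as j approaches 0
    # max() clamps the floating-point -0.0 that arises when j == 1.
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


a = kmer_set("ACGTACGTTGCA", 4)
b = kmer_set("ACGTACGATGCA", 4)
j = jaccard(a, b)
print(j, mash_distance(j, 4))
```

Because this estimator treats each sample as a single haploid sequence, heterozygous sites within a diploid sample add k-mers that it cannot distinguish from between-sample differences, which is the confounding the abstract sets out to correct.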

6

Low-Coverage Genome Sequencing Outperforms Target Enrichment Phylogenomics

Branstetter, M. G.; Freitas, F. V.; Benavides Silva, L. R.; Bossert, S.; Danforth, B. N.; Murray, E. A.

2026-06-09 evolutionary biology 10.64898/2026.06.05.730492 medRxiv

Top 0.1%

45.2%

Genome-scale data have transformed phylogenetic inference, yet most studies continue to rely on reduced-representation approaches that target a subset of loci to reduce cost and increase taxon sampling. Although effective, these methods require specialized laboratory workflows, constrain long-term data reuse, and may perform poorly with degraded DNA. Low-coverage whole genome sequencing (lcWGS) offers a streamlined alternative: shallow to moderate sequencing of complete genomes followed by bioinformatic extraction of loci of interest. Despite its promise, lcWGS has not been rigorously benchmarked against targeted enrichment using historical museum specimens. Here, we directly compared lcWGS and ultraconserved element (UCE) target enrichment across taxonomically diverse bee specimens collected between 1934 and 2021. Both data types were generated from the same Illumina libraries, enabling a controlled, head-to-head evaluation. Using standard UCE analytical pipelines, we quantified locus recovery, gene-tree support, and phylogenetic performance across sequencing methods and specimen age classes. We further assessed recovery of additional marker classes, including mitogenomes, BUSCO loci, and UCEs from a newly-designed, expanded probe set. Across all age categories, lcWGS consistently outperformed target enrichment, recovering more UCE loci and substantially longer alignments, with the largest gains observed in highly degraded specimens. Gene trees derived from lcWGS exhibited higher mean bootstrap support and greater topological concordance, translating into improved species-tree inference. In addition, lcWGS enabled recovery of markedly more non-target loci, expanding analytical flexibility beyond the original marker set. These results demonstrate that lcWGS not only matches but frequently exceeds the performance of targeted enrichment in museum-based phylogenomics, while providing broader genomic utility. 
As sequencing costs continue to decline, lcWGS represents a robust and forward-looking strategy for phylogenetic research, particularly in taxa with modest genome sizes and challenging DNA quality.

7

PhaseWY: A pipeline for haplotype phasing, sex chromosome identification and extraction of sex-limited sequences

Ellerstrand, S. J.; Churcher, A. M. J.; Kutschera, V. E.; Hansson, B.

2026-06-22 bioinformatics 10.64898/2026.06.17.732863 medRxiv

Top 0.1%

34.4%

Sex chromosomes are central to many ecological and evolutionary processes. Evidence has accumulated that sex chromosome systems vary extensively in age, turnover and transitions, motivating renewed efforts to study the diversity of sex chromosome systems across the tree of life. However, successful genomic detection of sex chromosomes depends on several factors, including the size and divergence time, background genetic diversity, and the number of sequenced females and males. In addition, technical challenges associated with sequencing and analysing the sex-limited Y/W chromosome remain. Here, we present PhaseWY, an automated Snakemake pipeline that uses whole-genome sequencing data from multiple female and male individuals to identify sex-chromosomal regions and extract the corresponding Y/W sequences. PhaseWY (i) detects sex differences in alignment depth, (ii) applies read-based and statistical haplotype phasing, (iii) identifies sex-linked regions using haplotype clustering, and (iv) subsets autosomal, X/Z- and Y/W-linked variants for downstream analyses. We applied PhaseWY to simulated data to benchmark factors influencing sex-linkage detection and successful extraction of Y/W-linked variants. To demonstrate its practical utility, we further applied PhaseWY to the neo-sex chromosome system in Alauda larks (Alaudidae) and performed a range of downstream analyses demonstrating the scope of applications of the PhaseWY output. We conclude that PhaseWY provides an easy-to-use and reproducible tool for population-genomic analyses in non-model organisms, with particular importance for advancing our understanding of sex-chromosome evolution.

8

A new method based on genome alignments provides a highly resolutive target enrichment set for weevils (Coleoptera, Curculionoidea)

Zelvelder, B.; Benoit, L.; Loiseau, A.; Haran, J.; Allio, R.

2026-05-13 evolutionary biology 10.64898/2026.05.09.724036 medRxiv

Top 0.1%

33.9%

Target enrichment methods have provided unprecedented advances in phylogenomics. Targeting hundreds of conserved regions has proven to be a good tradeoff between cost and efficiency, while being useful for museomics and diversified non-model clades. Unfortunately, current methods used for identifying such regions involve high degrees of conservation within targeted elements, usually pushing researchers to rely on flanking data with little guarantee of homology. The growing number of high-quality genomes available throughout the Tree of Life creates new opportunities to improve marker selection. In this study, we introduce GABBI, a new method for designing target capture probes by taking advantage of genome alignments, avoiding the selection of a single reference genome that can cause notable biases. We compare GABBI-derived markers to those from the most commonly used probe design method, PHYLUCE, at two taxonomic scales, the weevil superfamily Curculionoidea and the tribe Pachyrhynchini. At both taxonomic scales, results show that our new method identifies more variable loci that prove to be more phylogenetically resolutive than the PHYLUCE-derived ones. In doing so, we provide the first probe set specifically designed for weevils, targeting a wide set of 4,255 shared homologous regions, encouraging future research on systematics and macroevolution of one of the most diverse and economically important groups of insects. By providing GABBI as an automated and open-access pipeline, we hope to open new probe design opportunities to other taxonomic groups that face similar phylogenetic obstacles.

9

Molecular Star Gazing: Development and Validation of an Environmental DNA Assay for the Imperiled Sunflower Sea Star (Pycnopodia helianthoides)

Gold, Z.; Robinson, K. M.; Gehman, A.-L. M.; Shea, M. M.; Lemay, M. A.; Weinrich, J.; Kellogg, C. T. E.; Clemente-Carvalho, R. B. G.; Schiebelhut, L. M.; Boehm, A. B.; Kidd, A.; Kim, A.; Hodin, J.; Dawson, M.; McAllister, S. M.

2026-05-12 molecular biology 10.64898/2026.05.07.723600 medRxiv

Top 0.1%

22.2%

The sunflower sea star (Pycnopodia helianthoides) suffered a catastrophic population decline across its range from 2013 to 2017 due to the devastating Vibrio pectenicida FHCF-3-driven sea star wasting disease (SSWD) pandemic, with minimal signs of population recovery. The functional extinction of this apex predator across substantial parts of its range has created a need to identify and track the remaining intact populations. Environmental DNA (eDNA) approaches provide a simple, cost-effective, and non-destructive method for monitoring occurrences, and in some cases abundances, of marine species, consistently outperforming visual occurrence monitoring efforts in sensitivity, speed, and cost. Here, we designed, developed, and validated a P. helianthoides-specific eDNA assay to identify refugia, using both quantitative and digital droplet PCR approaches. We first generated the most comprehensive sea star mitochondrial genome reference database to date (n = 93 taxa, n = 15 novel). We then used unikseq and Geneious bioinformatics software to identify the unique nad5 gene region and design a highly specific hydrolysis probe-based PCR assay. We validated the performance of this assay through laboratory, mesocosm, and field testing, demonstrating a highly specific and sensitive assay. In a field application of the new assay across regions in British Columbia, Canada, we found a positive correlation between P. helianthoides eDNA concentrations and biomass density, especially when appropriately accounting for spatiotemporal integration scales (R² = 0.67). The eDNA assay provides a rapid and scalable tool for monitoring the sunflower sea star, which has been proposed for listing as threatened under the U.S. Endangered Species Act of 1973.
Molecular tools like the one presented here enhance management and recovery efforts not only by identification and monitoring of remnant wild populations, but also by helping to assess population level response and recovery following reintroduction efforts.

10

kinference: Pairwise kinship detection for Close-Kin Mark-Recapture

Bravington, M. V.; Baylis, S. M.; Eveson, P.; Feutry, P.

2026-05-21 genetics 10.64898/2026.05.18.725841 medRxiv

Top 0.1%

21.8%

Close-Kin Mark-Recapture (CKMR) is a statistical framework for estimating demographic parameters of wild populations. Instead of recapturing individuals, it relies on the identification of closely-related pairs such as parents and offspring, or siblings. By measuring how often such close-kin are "recaptured" among sampled animals (whether alive or dead), scientists can estimate demographic parameters such as census size, mortality rates, and connectivity. CKMR is starting to change fisheries and wildlife management by giving more reliable demographic information, even for many species that resist conventional approaches. Here we introduce the kinference R package, which provides a set of tools for finding close-kin pairs among thousands of samples each genotyped at thousands of SNPs, and for associated quality control. The CKMR context implies requirements and assumptions that differ from those of many other kinship programs. In particular, kinference accounts empirically for linkage without requiring a genome assembly, is able to estimate and control false-negative and false-positive probabilities, and can cope with null alleles. The package has been developed and used in numerous CKMR projects since 2017. This paper documents the assumptions, statistical algorithms, and intended workflow for kinference.

11

Towards the reliable use of aerial eDNA for ecosystem monitoring

Sokal, N.; Urbez-Torres, J. R.; Da Ros, L.

2026-05-21 genomics 10.64898/2026.05.19.726284 medRxiv

Top 0.1%

19.5%

Evidence supporting the use of airborne eDNA for biodiversity studies and ecosystem monitoring is growing. The promise of wide-area population dynamics data for downstream applications in targeted monitoring of pests and pathogens for agriculture and rare species for conservation is appealing; however, several technical challenges persist. Here, we focused on the development of a comprehensive dataset to facilitate assay development and accelerate the use of aerial sampling for species detection. Year-round metabarcoding data were generated using bacterial, fungal, plant, and arthropod primer sets and resulted in relative abundance estimates for 4,960 amplicon sequence variants (ASVs), 1,748 of which were assigned to a minimum taxonomic level of genus (bacteria, fungi, plants) or class (arthropods). Sequence diversity assessments and seasonal clustering based on presence/absence detection patterns were performed for individual ASVs, while discerning quantitative changes in seasonal abundance required grouping ASVs to at least the genus level. Examination of the technical aspects of metabarcoding suggested that the use of subsampling allows for consistent detection of genera with relative abundance values above 2%, even when samples have varying sequencing depths. Sequencing depth was the primary determinant for detecting sporadic and/or rare ASVs. Sampler comparisons, common sources of variation, and the benefits of barcoding regional species to supplement the existing taxonomic databases were discussed. Insufficient knowledge of sampler coverage area for the different organism types was identified as a limitation to the deployment of aerial monitoring networks. Considerations for further aerial metabarcoding efforts are suggested based on our experimental findings. Importance: Our study deals directly with the generation, analysis and limitations of airborne eDNA metabarcoding data for re-use by the broader environmental research community.
This includes timing of seasonal detection for possible genera of interest across multiple kingdoms, including bacteria, fungi, plants and animals (specifically arthropods), and support for the generation of local databases to assess the current limitations of universal primers for species/genus taxonomic resolution. With regard to methodology, it continues to build upon established best practices for airborne eDNA collection in areas such as sub-sampling and sampling replicates, sampler type and sequencing depth. To accelerate possible uptake and application of the data, we provide the identified ASVs and their seasonal relative abundances as a resource.
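
The abstract does not specify the subsampling procedure, so the following is only a minimal sketch of one common choice: rarefying each sample to a fixed read depth by drawing reads without replacement, then computing relative abundances. The function names, the fixed seed, and the toy counts are assumptions made for illustration.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample a read-count table to `depth` reads without replacement."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand to one label per read, then draw `depth` reads at random.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return Counter(rng.sample(reads, depth))


def relative_abundance(counts):
    """Convert read counts to proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


# Made-up toy sample: one dominant genus and two rarer ones.
sample = {"Genus_A": 900, "Genus_B": 90, "Genus_C": 10}
sub = rarefy(sample, 200)
print(relative_abundance(sub))
```

With every sample reduced to the same depth, relative abundances become comparable across samples, but genera present at very low proportions may drop out of any given draw, in line with the abstract's observation that sequencing depth governs detection of rare ASVs.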

12

Shark sexing from forensic, archival, and developmental samples using sex-linked DNA markers

Akane, O.; Kawaguchi, Y. W.; Niwa, T.; Uno, Y.; Kuraku, S.

2026-05-06 ecology 10.64898/2026.05.02.722412 medRxiv

Top 0.1%

19.0%

The effective management of threatened shark populations relies on accurate demographic data, particularly operational sex ratios. While sex identification in intact shark bodies is straightforward through the presence of external male organs, namely claspers, it remains impossible for processed fins in the illegal wildlife trade, early-stage embryos in breeding programs, or archived tissue fragments and blood samples where morphological traits are lost. Here, we present a robust molecular sexing framework leveraging recently identified sequences from shark sex chromosomes, which, to our current knowledge, are consistently organized in the XY system. Our approach consists of two distinct methodologies tailored to the current identification status of sex chromosome sequences in the target species. For the whale shark Rhincodon typus and the brownbanded bamboo shark Chiloscyllium punctatum, we employed end-point PCR assays targeting male-specific Y-linked markers. For the cloudy catshark Scyliorhinus torazame, we developed a quantitative PCR (qPCR) assay targeting differential X chromosome dosage. In this dosage-based system, females (XX) are distinguished by an amplification profile approximately one cycle earlier than males (XY). By integrating X-linked dosage quantification, our framework provides a critical internal control that significantly enhances reliability, allowing researchers to distinguish true females from PCR failures. This toolkit offers a versatile solution for diverse applications, ranging from the study of sex determination mechanisms in pre-phenotypic embryos to the reconstruction of sex ratios from space-constrained tissue archives and global wildlife forensics, thereby contributing to the comprehensive conservation of shark biodiversity.
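
The "approximately one cycle earlier" figure follows from basic qPCR arithmetic: if template doubles each cycle, a two-fold difference in starting copies (two X copies in females versus one in males) shifts the threshold cycle by log2(2) = 1. The sketch below shows only that arithmetic. The function names and the efficiency parameter are assumptions, and the abstract does not describe how the assay is normalized or how samples are called.

```python
import math


def fold_difference(delta_ct, efficiency=1.0):
    """Template fold difference implied by a threshold-cycle shift.

    efficiency=1.0 means perfect doubling per cycle (an idealisation).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return (1.0 + efficiency) ** delta_ct


def expected_delta_ct(copy_ratio, efficiency=1.0):
    """Threshold-cycle shift expected for a given starting copy ratio."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return math.log(copy_ratio) / math.log(1.0 + efficiency)


# XX versus XY: a 2:1 ratio of X-linked template.
print(expected_delta_ct(2.0))        # ideal efficiency
print(expected_delta_ct(2.0, 0.9))   # slightly sub-ideal efficiency
```

At efficiencies below 100% the expected shift is slightly larger than one cycle, which is one reason a dosage-based call generally needs replicate reactions and known-sex controls.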

13

The old pipe gives the sweetest smoke: A phylogenetic turn for eDNA metabarcoding

Haderle, R.; Ung, V.; Jung, J.-L.

2026-06-15 genetics 10.64898/2026.06.11.731524 medRxiv

Top 0.1%

18.7%

Environmental DNA (eDNA) metabarcoding has transformed biodiversity monitoring, yet most analyses rely on taxonomic metrics that are sensitive to methodological variation and limit cross-study comparability. We propose a "phylogenetic turn" in eDNA analysis through the integration of phylogenetic diversity (PD) metrics. By incorporating evolutionary relationships, PD reduces dependence on species-level resolution, increases robustness to detection biases, and better captures the evolutionary "option value" of biodiversity. We synthesize key PD metrics across richness, divergence, and regularity, emphasizing the use of standardized effect sizes (SES) for ecological interpretation while addressing challenges in metric selection. We apply this framework to five marine eDNA datasets (2021-2025) spanning ecologically and geographically contrasting ecosystems, from tropical to Arctic regions, and encompassing a wide gradient of anthropogenic pressure. Across datasets, we identify consistent patterns: anthropized ecosystems exhibit high taxonomic richness but reduced phylogenetic diversity, indicating phylogenetic clustering, whereas less disturbed systems show lower richness but greater evolutionary breadth. These findings demonstrate that PD reveals ecological structure not captured by taxonomic metrics, including signatures of environmental filtering and community assembly processes. By providing a reproducible analytical workflow based on standardized eDNA datasets, we position phylogenetic diversity as a critical bridge between eDNA data and conservation frameworks. Ultimately, eDNA-based phylogenetic approaches open new avenues for decoding global biodiversity patterns across heterogeneous ecosystems.
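
The standardized effect size mentioned in the abstract is conventionally computed as SES = (observed - mean of null) / SD of null, where the null values come from randomized communities. The sketch below shows just that calculation. The function name and toy numbers are assumptions, and it uses the sample standard deviation, a choice the abstract does not specify.

```python
import statistics


def standardized_effect_size(observed, null_values):
    """SES of an observed metric against a null distribution."""
    mean_null = statistics.mean(null_values)
    sd_null = statistics.stdev(null_values)  # sample SD (n - 1 denominator)
    if sd_null == 0:
        raise ValueError("null distribution has zero variance")
    return (observed - mean_null) / sd_null


# Made-up toy values: observed PD below every randomized community.
null_pd = [8.0, 10.0, 12.0]
ses = standardized_effect_size(6.0, null_pd)
print(ses)
```

A negative SES means the community holds less phylogenetic diversity than expected for its richness, which is the clustering signal the abstract reports for anthropized ecosystems; a positive value indicates overdispersion.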

14

Charting the insect biodiversity of Crete: insights from a pilot metabarcoding survey

Koutsovoulos, G. D.; Sorg, M.; Hörren, T.; Buchner, D.; Bourlat, S. J.; Langen, K.; Trichas, A.; Leese, F.; Stamatakis, A.

2026-06-08 ecology 10.64898/2026.06.05.730060 medRxiv

Top 0.1%

18.6%

Among eukaryotes, insects are by far the most diverse organisms on Earth, yet their global decline threatens ecosystem stability. Understanding local and regional biodiversity patterns is critical for conservation planning, ecosystem management, and predicting responses to environmental change, but traditional surveys for assessing insect diversity (e.g., manual collection, morphological identification, and counting) are highly labor-intensive, time-consuming, and often require rare or simply unavailable dedicated taxonomic expertise. DNA metabarcoding offers an efficient, high-resolution alternative to assess insect communities. Here, we report on the first insect metabarcoding survey on Crete that spans two years of sample collection between 2021 and 2023 from a small area in Southern Central Crete in the context of a citizen science project. A total of 29 samples yielded 10,865 Exact Sequence Variants (ESVs), 10,516 of which were assigned to insects, covering 988 species, 900 genera, and 227 families across 14 orders. A comparison with the existing observation records reveals 406 potential newly-observed species and an estimated 690 unclassified species, indicating substantial cryptic diversity. Our results demonstrate that even small-scale sampling can unravel substantial insect diversity and highlight critical gaps in barcode reference databases. Our study demonstrates how DNA metabarcoding can accelerate biodiversity discovery and monitoring in understudied regions.

15

A genomic tool to tackle cryptic diversity demonstrates the potential for off-target use of GT-seq panels

Ackiss, A. S.; Vinson, M. R.; Ropp, A. J.; Gruenthal, K. M.; Krabbenhoft, T. J.; Siegel, J. V.; Stott, W.; Yule, D. L.; Larson, W. A.

2026-06-12 genomics 10.64898/2026.06.09.731139 medRxiv

Top 0.1%

18.3%

A comprehensive understanding of life history is vital to successful species conservation and management. When different life history stages are accompanied by considerable morphological or cryptic variation, such as the egg and larval phases exhibited by most fishes, genomic tools are essential for identifying species so that early-life ecology questions can be studied. Genotyping-in-thousands by sequencing (GT-seq) has recently emerged as a targeted and efficient approach for species identification. We leveraged existing genomic and transcriptomic data to develop a GT-seq panel capable of differentiating the members of the Coregonus artedi complex, a radiation of salmonids in the Laurentian Great Lakes whose members are indistinguishable with mitochondrial DNA barcoding loci and are the focus of bi-national conservation initiatives. Our panel of 494 loci was able to assign fishes in the C. artedi complex to species and lake. We examined cross-amplification in other coregonines with overlapping distributions and found that congeneric Lake Whitefish (C. clupeaformis) cross-amplified at 94% of loci and confamilial Round and Pygmy Whitefish (Prosopium spp.) cross-amplified at 42% and 38% of loci, respectively. We adapted bioinformatic probes to account for Prosopium-specific variants including 22 new SNPs and developed a whitelist of 428 SNPs capable of distinguishing these whitefishes. Finally, we demonstrated performance by identifying 3,066 coregonine larvae and juveniles collected in spring 2019-2021 from Lake Superior. These results hold promise for future insights into the species-specific ecology of early life coregonines and demonstrate the flexibility of GT-seq panels, which may cross-amplify hundreds of informative genome-wide loci in related taxa.

16

Validated microsatellite markers for Gyrodactylus salaris: a toolkit for individual identification and genetic studies

Aisala, H.; Hansen, H.; Lumme, J.

2026-04-24 genomics 10.64898/2026.04.22.719836 medRxiv

Top 0.1%

18.3%

Microsatellite markers remain essential for individual-level genetic work in taxa where genome-wide methods are not yet routinely feasible due to extremely low DNA yields per specimen. In Gyrodactylus, even the most recent reference genomes have required pooling thousands of individuals, leaving a practical gap between genome-scale resources and individual-level analyses. Here we present a genome-informed microsatellite panel, developed by selecting single-copy loci with non-repetitive flanking regions and assembling all markers into a single multiplex PCR. Marker identity and performance were verified via amplification tests, Sanger sequencing, and cross-laboratory genotyping, confirming that the same samples generated identical fragment-size profiles in both laboratories. Long tandem repeats occasionally prevented exact repeat-count determination, yet allele-size classes were discrete and reproducible across replicates. The panel enables rapid individual identification and reliable strain and lineage assignment. It also offers a practical starting point for population-genetic and evolutionary studies that require individual-level data.

17

MycorrhizaTracer: a bioinformatic pipeline for fungi and plant classification of Sanger DNA sequences

Brekke, T. D.; Weeks, T.; Barber, R. A.; Thomson, I.; Gooda, R.; Gargiulo, R.; Delhaye, G.; Andrew, C.; Kowal, J.; Bidartondo, M.; Martinez-Suz, L.

2026-04-27 bioinformatics 10.64898/2026.04.23.720352 medRxiv

Top 0.1%

14.9%

Processing Sanger DNA sequences remains a routine yet technically demanding step in many biodiversity and ecological studies, particularly when barcoding large numbers of environmental samples. Manual inspection and editing of trace files, DNA sequence alignment, and classification using taxonomic reference databases are time-consuming, inconsistent, and prone to error. These challenges are compounded in studies involving degraded samples, in-house DNA sequencing, under-described taxa, or when investigators have limited access to computational tools. We present MycorrhizaTracer, an open-source, fully automated pipeline for processing and taxonomically classifying large batches of Sanger sequencing chromatograms. We have optimized it for fungal and plant taxa, but it is adaptable across the tree of life. The pipeline performs quality trimming, consensus generation from bidirectional reads, taxonomic classification via BLAST, clustering, optional salvaging of low-quality sequences, and functional annotation of fungal taxa. Designed for scalability and ease of use, MycorrhizaTracer can process thousands of DNA chromatograms in a matter of hours without the need for an HPC. Accuracy and ecological relevance are ensured by features such as gene region-specific taxonomic filtering and sequence-based clustering of unclassified reads. By streamlining trace-to-taxon workflows, MycorrhizaTracer reduces the burden of manual curation, supports reproducibility, and enables efficient recovery of biodiversity data from Sanger sequences, particularly in field-based or resource-limited research contexts.

18

An evaluation of clustering and assembly strategies from Iso-Seq data in the absence of reference genomes in non-model animals

Eleftheriadi, K.; Vazquez-Valls, M.; Fernandez, R.

2026-07-08 evolutionary biology 10.1101/2025.09.18.677004 medRxiv

Top 0.1%

14.7%

Transcriptome assembly enables the recovery of expressed genes and isoforms, but the optimal strategy for reconstructing transcriptomes from long-read sequencing remains unresolved. In particular, establishing best practices for generating accurate gene models and selecting representative isoforms is essential for comparative genomics, because orthology inference typically includes only the longest isoform per gene model. Here, we systematically compare clustering and de novo assembly methods using PacBio Iso-Seq data from 17 animal lineages spanning seven phyla, most of them non-model species, with the goal of investigating which methodology is better suited to selecting one isoform per gene model, in the absence of specific pipelines to do so. We evaluate four approaches: isoseq cluster, CD-HIT, RNA-Bloom2 and isONform. We benchmark them against short-read assemblies built with Trinity, assessing assembly quality with BUSCO completeness, short-read mapping rates, coding sequence recovery, and longest isoform prediction. Our results show that CD-HIT clustering at high similarity thresholds (≥99%) yields the most complete and coding-rich long-read transcriptomes, rivaling Trinity while avoiding its high redundancy. Consensus-based methods such as isoseq cluster and isONform recover fewer single-copy orthologs (mirrored in a lower BUSCO score) and achieve lower mapping rates, while RNA-Bloom2 provides intermediate performance with reduced duplication. Together, these findings establish CD-HIT as, to date, a robust and practical strategy for transcriptome reconstruction from long-read data when genomic references are unavailable. By benchmarking de novo methods across a taxonomically broad dataset, this work defines the realistic capabilities of long-read transcriptome reconstruction in the absence of a reference genome and provides practical guidance for deriving high-quality gene models and selecting representative isoforms for orthology inference in non-model species.
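The step of keeping one representative isoform per gene model can be sketched as follows. This is a minimal illustration, assuming that cluster membership is already available as a plain mapping from sequence ID to cluster ID. The function name, the input format, and the toy sequences are hypothetical, and the code does not parse the output of any of the tools named in the abstract.

```python
def longest_per_cluster(sequences, cluster_of):
    """Return {cluster_id: seq_id} keeping the longest sequence per cluster.

    sequences  -- dict mapping sequence ID to nucleotide string
    cluster_of -- dict mapping sequence ID to cluster ID (assumed input)
    Ties are resolved in favour of the sequence encountered first.
    """
    best = {}
    for seq_id, seq in sequences.items():
        cluster_id = cluster_of[seq_id]
        current = best.get(cluster_id)
        if current is None or len(seq) > len(sequences[current]):
            best[cluster_id] = seq_id
    return best


# Toy data: two clusters, three isoforms.
toy_sequences = {
    "iso1": "ATGGCC",
    "iso2": "ATGGCCAAATTT",
    "iso3": "ATGTTT",
}
toy_clusters = {"iso1": "gene_a", "iso2": "gene_a", "iso3": "gene_b"}

representatives = longest_per_cluster(toy_sequences, toy_clusters)
print(representatives)
```

Selecting by raw length is the simplest possible criterion. A real analysis might instead rank isoforms by predicted coding sequence length, which this sketch does not attempt.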

19

Complementary Insights from Environmental DNA and Environmental RNA Metabarcoding for Marine Biodiversity Assessment Around San Andres Island, Colombia

Bedingfield, S. K.; Vanegas Moreno, C.; More, A. F.

2026-06-08 genetics 10.64898/2026.06.03.730006 medRxiv

Top 0.1%

12.7%

Environmental DNA (eDNA) metabarcoding has become a cornerstone of marine biodiversity monitoring, yet it recovers genetic material irrespective of organism viability and may therefore conflate historical and contemporary community signals. Environmental RNA (eRNA), derived from less stable ribonucleic acid, is hypothesized to be biased toward metabolically active organisms and may provide a more temporally resolved snapshot of living communities. Here we present a paired eDNA/eRNA metabarcoding comparison across a tropical marine seascape, analyzing 19 co-sampled sites spanning coral reefs, mangroves, a seagrass bed, shipwrecks, a cenote, and coastal infrastructure around San Andres Island, Colombia. To our knowledge this is the first in situ, ecosystem-scale paired eDNA/eRNA survey of the broad eukaryotic community across multiple natural habitat types in a tropical marine system, extending mesocosm and freshwater work (e.g., Giroux et al., 2022) to a field setting. Using COI-region amplicon sequencing processed by NatureMetrics, we recovered 1,944 operational taxonomic units (OTUs) across the 19 paired sites. Of these, 1,015 (52.2%) were detected by both approaches, 305 (15.7%) were unique to eDNA, and 624 (32.1%) were unique to eRNA. The eRNA-unique fraction was taxonomically enriched for groups including diatoms (class Bacillariophyceae, phylum Ochrophyta), ciliates, and other protists. Paired Wilcoxon signed-rank tests showed that eRNA recovered significantly higher OTU richness (median 239 vs. 207; W = 36, p = 0.016) and Shannon diversity (median 3.64 vs. 3.38; W = 40, p = 0.026) than eDNA. The mean per-site Jaccard similarity between paired samples was 0.40, indicating substantial turnover in the rare-taxon composition recovered by each method. 
Principal coordinates analysis of Bray-Curtis dissimilarity showed that habitat type structured abundance-weighted community composition (PERMANOVA F = 2.49, p = 0.001) whereas molecular method did not (F = 1.37, p = 0.107). A PERMDISP test found homogeneous multivariate dispersion between methods (F = 0.01, p = 0.92), reinforcing the absence of a method effect, but significant dispersion heterogeneity among habitats (F = 24.0, p < 0.01), so the habitat result is interpreted with caution. Indicator species analysis identified 73 OTUs significantly associated with one template: eDNA indicators were dominated by dinoflagellates (Dinophyceae) and eRNA indicators by diatoms (Bacillariophyceae) and fungi, consistent with an eRNA bias toward metabolically active microbial eukaryotes. A read-weighted overlap analysis showed that although eRNA-unique OTUs outnumbered eDNA-unique OTUs roughly two to one, the large majority of reads (>95%) fell in shared OTUs, so method-unique detections are predominantly rare taxa. We discuss the complementary value of eRNA for marine monitoring, with the seagrass habitat, where eRNA reduced masking by terrestrial plant material, as the clearest use case, and propose, rather than prescribe, the integration of eRNA into routine programs.
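The overlap percentages reported in this abstract follow directly from the three OTU counts, and the Jaccard index is the size of the intersection divided by the size of the union. The sketch below reproduces that arithmetic. The counts come from the abstract, while the variable names and the toy OTU sets are illustrative assumptions. Note that the pooled ratio computed here (shared OTUs over all OTUs) is a different quantity from the mean per-site Jaccard similarity of 0.40, which averages over individual site pairs.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# OTU counts as reported in the abstract.
shared, edna_only, erna_only = 1015, 305, 624
total = shared + edna_only + erna_only

percentages = {
    "shared": round(100 * shared / total, 1),
    "eDNA only": round(100 * edna_only / total, 1),
    "eRNA only": round(100 * erna_only / total, 1),
}
pooled_jaccard = shared / total  # intersection over union across all sites

# Toy per-site example with hypothetical OTU labels.
site_edna = {"otu1", "otu2", "otu3"}
site_erna = {"otu2", "otu3", "otu4", "otu5"}
site_jaccard = jaccard(site_edna, site_erna)

print(total, percentages, round(pooled_jaccard, 3), site_jaccard)
```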

20

Optimal Reference Panel Design in Ancient DNA Imputation from Coalescent Theory, Simulation, and Real Data Application with an Ancient Reference Panel

Sousa da Mota, B.; Kumar, K.; Reich, D. E.; Zoellner, S.

2026-04-28 genomics 10.64898/2026.04.27.721163 medRxiv

Top 0.1%

12.6%

Imputation is widely used in the ancient DNA (aDNA) field to determine which phenotypically important alleles ancient individuals carried, to study natural selection, and to detect segments of the genome that are shared between individuals identical by descent. However, rare variant imputation is less accurate, and rare variants tend to be excluded from downstream analyses. State-of-the-art imputation methods leverage large reference panels, improving rare variant accuracy in modern targets. However, it is unclear how to identify optimal panels for aDNA targets. It seems plausible that aDNA reference panels would improve imputation of aDNA, but no such panels have been assembled or tested. We leveraged analytical results from coalescent theory and complementary simulations to evaluate both the performance of large modern panels and the impact of ancient panels on aDNA imputation. For modern panels, sample sizes as small as 5,000 saturate imputation performance, and model misspecifications in standard imputation algorithms increase imputation error for rare and intermediate frequency variants. For instance, for European hunter-gatherers, non-reference imputed variants with derived allele frequency less than at least 2% should be removed. Including ancient genomes in a modern reference panel substantially improved imputation accuracy in analytical modelling and simulations, particularly for rare variants and older samples from groups with low effective population size. We assembled a joint reference panel with 1000 Genomes and 95 ancient samples and used it to impute 95 downsampled genomes, finding modest gains in imputation performance. This approach can rescue rare variants typically discarded from current imputation pipelines and may prove useful as the number of ancient samples increases.